Model Selection for Part of Speech Tagging With Type Supervision
Abstract
Model selection (picking, for example, a parametric model family, a prior, and an estimation criterion) is crucial for building high-accuracy classifiers. In supervised learning settings, the accuracy of a model can be estimated on a labeled set and used to guide modeling decisions. In unsupervised or type-supervised learning settings, unsupervised model selection criteria are used, but their performance is far from optimal (Smith and Eisner, 2005). Here we propose a new model selection criterion for type-supervised sequence labeling settings, which uses the available weak supervision (type-level constraints) more directly to derive an accuracy-like metric. We evaluate the effectiveness of the method on type-supervised POS tagging in nine languages, using both HMM and LDA-based models, and show that it outperforms two unsupervised model selection criteria.
Similar resources
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...
Model Selection for Type-Supervised Learning with Application to POS Tagging
Model selection (picking, for example, the feature set and the regularization strength) is crucial for building high-accuracy NLP models. In supervised learning, we can estimate the accuracy of a model on a subset of the labeled data and choose the model with the highest accuracy. In contrast, here we focus on type-supervised learning, which uses constraints over the possible labels for word ty...
Part-of-Speech Tagging of the Persian Language Using a Fuzzy Network Model
Part-of-speech tagging (POS tagging) is an ongoing research topic in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
Observational Initialization of Type-Supervised Taggers
Recent work has sparked new interest in type-supervised part-of-speech tagging, a data setting in which no labeled sentences are available, but the set of allowed tags is known for each word type. This paper describes observational initialization, a novel technique for initializing EM when training a type-supervised HMM tagger. Our initializer allocates probability mass to unambiguous transitio...
Learning a Part-of-Speech Tagger from Two Hours of Annotation
Most work on weakly-supervised learning for part-of-speech taggers has been based on unrealistic assumptions about the amount and quality of training data. For this paper, we attempt to create true low-resource scenarios by allowing a linguist just two hours to annotate data and evaluating on the languages Kinyarwanda and Malagasy. Given these severely limited amounts of either type supervision...
Publication date: 2014